debakarr
GitHub Repository: debakarr/machinelearning
Path: blob/master/Part 2 - Regression/Decision Tree Regression/[Python] Decision Tree Regression.ipynb
Kernel: Python 3

Decision Tree Regression

from IPython.display import Image

Classification And Regression Trees (CART) is a term introduced by Leo Breiman to refer to decision tree algorithms that can be used for either classification or regression.


Image('img/01.png')
Image in a Jupyter notebook
Image('img/02.png')
Image in a Jupyter notebook
Image('img/03.png')
Image in a Jupyter notebook

The algorithm splits the data into several terminal leaves, each of which stores the average of the training points that fall into it. Above we have two independent variables and one dependent variable. Given the values of two new independent variables, we can predict the dependent variable far more precisely than with the naive approach (where, no matter what the two new independent variables are, we would simply assign the average of all the points as the prediction).
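The contrast between the naive approach and the tree's per-leaf averaging can be sketched with made-up numbers (these values are purely illustrative, not taken from the figures above):

```python
import numpy as np

# Made-up 1-D data for illustration: feature x1 and target y.
x1 = np.array([5.0, 10.0, 35.0, 45.0])
y = np.array([10.0, 20.0, 300.0, 400.0])

# Naive approach: one global average for every query.
naive = y.mean()
print(naive)  # 182.5

# Tree-style approach: average only the points in the region
# the query falls into, e.g. the region x1 < 20 for a query x1 = 12.
region_avg = y[x1 < 20].mean()
print(region_avg)  # 15.0 -- much closer to the points near the query
```

For a query with x1 = 12, the regional average of 15.0 is a far better estimate than the global average of 182.5, which is pulled up by points in a completely different region.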

For example, let's say we want to predict the dependent variable for the independent variables X1 = 30 and X2 = 100.

Then from the decision tree we can say that Y = -64.1 (since X1 < 20 => No, X2 < 170 => Yes and X1 < 40 => Yes).
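The walk just described can be hand-coded as nested conditions. Only the -64.1 leaf value comes from the figures; the other leaf averages (LEAF_A, LEAF_B, LEAF_C) are hypothetical placeholders:

```python
# Hand-coded walk of the example tree above.
# LEAF_A/LEAF_B/LEAF_C are hypothetical placeholder averages;
# only the -64.1 leaf value is taken from the figure.
LEAF_A = LEAF_B = LEAF_C = 0.0  # hypothetical values

def predict(x1, x2):
    if x1 < 20:          # X1 < 20?  (x1 = 30 -> No)
        return LEAF_A
    if x2 < 170:         # X2 < 170? (x2 = 100 -> Yes)
        if x1 < 40:      # X1 < 40?  (x1 = 30 -> Yes)
            return -64.1  # terminal leaf average from the figure
        return LEAF_B
    return LEAF_C

print(predict(30, 100))  # -64.1
```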


Data Preprocessing

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
%matplotlib inline
plt.rcParams['figure.figsize'] = [14, 8]

# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

Fitting the Decision Tree Regression Model to the dataset

regressor = DecisionTreeRegressor(random_state = 42)
regressor.fit(X, y)
DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, presort=False, random_state=42, splitter='best')

Predicting a new result

# predict expects a 2D array of shape (n_samples, n_features)
y_pred = regressor.predict([[6.5]])
y_pred
array([ 150000.])
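A self-contained sketch of the same fit-and-predict steps, using hypothetical salary figures as a stand-in for Position_Salaries.csv (which is not reproduced here). With the default unlimited depth, the tree fits the ten training points exactly, so every training level is predicted back perfectly:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in data: position levels 1-10 and salaries.
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000], dtype=float)

reg = DecisionTreeRegressor(random_state=42)
reg.fit(X, y)

# predict takes a 2D array: one row per sample, one column per feature.
y_pred = reg.predict([[6.5]])
print(y_pred)

# A fully grown tree interpolates the training set exactly.
print(np.allclose(reg.predict(X), y))  # True
```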

Visualising the Decision Tree Regression results

plt.scatter(X, y, color = 'red')
plt.plot(X, regressor.predict(X), color = 'blue')
plt.title('Truth or Bluff (Regression Model)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
Image in a Jupyter notebook

Visualising the Decision Tree Regression results (for higher resolution and smoother curve)

X_grid = np.arange(min(X), max(X), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
plt.title('Truth or Bluff (Regression Model)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
Image in a Jupyter notebook

From the above graph it is clear that the model predicts a constant (average) value over each interval. In particular, the predicted Salary for any level between 5.5 and 6.5 is 150000.
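The piecewise-constant behaviour can be checked directly: every query inside the same leaf interval returns the same value. A self-contained sketch, again with hypothetical salary figures standing in for the real dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in data: position levels 1-10 and salaries.
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000], dtype=float)

reg = DecisionTreeRegressor(random_state=42).fit(X, y)

# Sample several query points strictly inside one leaf interval.
grid = np.arange(5.6, 6.5, 0.1).reshape(-1, 1)
preds = reg.predict(grid)

# All queries in the interval map to the same leaf average.
print(np.unique(preds))
```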